Natural Language Processing: An Efficient Tamil Text Compaction System
Authors
Abstract
Tamil is gradually becoming the language of online communication and mobile text messaging for many Tamils around the world. Social networks and mobile platforms now widely support Unicode and offer applications for typing Tamil text. Some social networks and mobile messaging services limit the number of characters per message. Compacting text therefore becomes essential, since it saves online storage space and cost, among other benefits. This paper proposes a text compaction system for Tamil, the first of its kind for the language. The proposed system handles common Tamil words, acronyms/abbreviations and numbers. A morphological analyzer [1] and a morphological generator are used to stem inflected words and replace them with compact forms drawn from a mapping repository. The system was tested on over 10,000 words, and the final output was reduced to 40% of the original text. The paper concludes by discussing possible extensions to the system.
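The pipeline the abstract describes (replace acronyms and numbers, stem inflected words, map each root to a compact form, then regenerate the inflection) can be sketched roughly as follows. This is a minimal toy, not the authors' implementation: the mapping entries, the suffix list, and the functions `analyze`, `generate`, `compact` and `compaction_ratio` are hypothetical, words are written in a Latin transliteration for readability, and naive suffix stripping and concatenation stand in for a real Tamil morphological analyzer and generator (which would also handle sandhi changes at morpheme boundaries).

```python
"""Toy sketch of mapping-repository text compaction (hypothetical data)."""

from typing import Dict, List, Optional, Tuple

# Hypothetical mapping repository: root word -> compact form.
# Toy entries for illustration only; not taken from the paper.
WORD_MAP: Dict[str, str] = {
    "vanakkam": "vnkm",
    "nanri": "nri",
}

# Hypothetical acronym table: lowercased word sequence -> acronym.
ABBREV_MAP: Dict[Tuple[str, ...], str] = {
    ("tamil", "nadu"): "TN",
}
MAX_PHRASE = max(len(k) for k in ABBREV_MAP)

# Number words in Tamil script (one, two, three) -> digits.
NUMBER_MAP: Dict[str, str] = {"ஒன்று": "1", "இரண்டு": "2", "மூன்று": "3"}

# Toy transliterated suffixes standing in for a real morphological analyzer.
SUFFIXES = ("kal", "ai", "ukku")


def analyze(word: str) -> Tuple[str, str]:
    """Split word into (root, suffix) only if the root is in WORD_MAP."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        root = word[: -len(suffix)]
        if word.endswith(suffix) and root in WORD_MAP:
            return root, suffix
    return word, ""


def generate(root: str, suffix: str) -> str:
    """Stand-in generator: plain concatenation, no sandhi rules."""
    return root + suffix


def match_abbrev(tokens: List[str], i: int) -> Optional[Tuple[str, int]]:
    """Return (acronym, tokens consumed) for the longest phrase at position i."""
    for n in range(MAX_PHRASE, 0, -1):
        phrase = tuple(t.lower() for t in tokens[i : i + n])
        if len(phrase) == n and phrase in ABBREV_MAP:
            return ABBREV_MAP[phrase], n
    return None


def compact(text: str) -> str:
    tokens = text.split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        hit = match_abbrev(tokens, i)
        if hit is not None:  # acronym / abbreviation
            out.append(hit[0])
            i += hit[1]
            continue
        tok = tokens[i]
        if tok in NUMBER_MAP:  # number word -> digits
            out.append(NUMBER_MAP[tok])
        else:  # stem, map root, regenerate inflection
            root, suffix = analyze(tok)
            if root in WORD_MAP:
                out.append(generate(WORD_MAP[root], suffix))
            else:
                out.append(tok)  # unknown word: keep unchanged
        i += 1
    return " ".join(out)


def compaction_ratio(text: str) -> float:
    """Length of the compacted text as a fraction of the original length."""
    return len(compact(text)) / len(text)


if __name__ == "__main__":
    sample = "vanakkam tamil nadu nanrikal ஒன்று"
    print(compact(sample))  # vnkm TN nrikal 1
    print(round(compaction_ratio(sample), 2))
```

Keeping unknown words unchanged means the compactor never loses information it cannot map; the achievable ratio then depends entirely on how much of the input vocabulary the mapping repository covers.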
Similar resources
Template Based Multilingual Summary Generation
Summarization of large text documents has become an essential task in many Natural Language Processing (NLP) applications. Certain NLP applications deal with domain-specific text documents and demand a domain-specific summary. When the essential facts are extracted specific to the domain, the summary proves to be more efficient. The proposed system builds a bilingual summary for an Information...
Grammar Checker Features in Modern Tamil Natural Language Processing
In general, Tamil NLP applications are programmed to work with different kinds of input data. Inputs are classified into text, images, sound waves, etc. Tamil text-based applications are built on word-formation techniques. Word analysis and generation are carried out in two ways: (i) untagging and tagging, and (ii) word-level and character-level accuracies. This method is processing based...
Automatic Conversion of Dialectal Tamil Text to Standard Written Tamil Text using FSTs
We present an efficient method to automatically transform spoken language text to standard written language text for various dialects of Tamil. Our work is novel in that it explicitly addresses the problem and need for processing dialectal and spoken language Tamil. Written language equivalents for dialectal and spoken language forms are obtained using Finite State Transducers (FSTs) where spok...
Lexicalized and Statistical Parsing of Natural Language Text in Tamil using Hybrid Language Models
Parsing is an important process in Natural Language Processing (NLP) and Computational Linguistics, used to understand the syntax and semantics of natural language (NL) sentences according to the grammar. A parser is a computational system that processes an input sentence according to the productions of the grammar and builds one or more constituent structures which conform to the grammar...
A Novel Data Driven Algorithm for Tamil Morphological Generator
Tamil is a morphologically rich language with an agglutinative nature. Because it is agglutinative, most word features are affixed to the root word as postpositions. The morphological generator takes a lemma, a POS category and a morpho-lexical description as input and gives a word form as output. It is the reverse process of a morphological analyzer. In any natural language generation system, morpho...